Optimize checked_ilog and pow when base is a power of two#147250
Optimize checked_ilog and pow when base is a power of two#147250Kmeakin wants to merge 5 commits into
checked_ilog and pow when base is a power of two#147250Conversation
8f19ce6 to
e5ca8cd
Compare
This comment has been minimized.
This comment has been minimized.
|
does this affect the codegen in practical circumstances? I would expect even a fairly weak optimizer to perform this optimization, making us hand-coding it irrelevant and potentially even harmful because we introduce more conditional logic to chew through. |
checked_ilog when base is a power of twochecked_ilog and pow when base is a power of two
It replaces a loop by some bit manipulations. Whether anyone actually calls either function with a compile-time known, power of 2 base is another question. But it can't hurt, since it is guarded by
No, in either function, LLVM is not able to see that the loop is performing a log/pow and apply the identities: https://godbolt.org/z/6vMsxc9Kh
They're guarded by |
2faa397 to
abb7f32
Compare
...then I'm kinda surprised! Nice catch. |
Would be good to have codegen tests to demonstrate what this is doing -- especially since that way there's a way for people to try removing the special cases later if they think that LLVM no longer needs them. |
abb7f32 to
9ab0f63
Compare
9ab0f63 to
1d0ac82
Compare
|
thanks for the codegen tests! |
1d0ac82 to
c84b99e
Compare
|
@rustbot ready |
|
@scottmcm ping? |
There was a problem hiding this comment.
The code here is looking good, but can you make sure there's normal runtime behaviour tests for it too? Notably, after the base conversation (that's fixed in the code) it made me think that that's not currently covered by any tests, so we should have something -- maybe just add to the # Examples that it returns None for base zero or one? (They seem like perfectly reasonable and helpful examples, in addition to giving coverage for this stuff.)
And maybe add some should_panic tests to ensure that the overflow checking is still correct for pow when overflow checks are enabled?
This comment has been minimized.
This comment has been minimized.
|
With the one nit resolved (assuming there's no reason to leave as-is), LGTM. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
if base == 2 ** k, then log(base, n) == log(2, n) / k
f23f1e5 to
f20dfa8
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
Increase test coverage to check all interesting edge cases and all variants.
`strict_pow` can be implemented in terms of `checked_pow`, `wrapping_pow` can be implemented in terms of `overflowing_pow`, and `pow` can be implemented in terms of `strict_pow` or `wrapping_pow`.
Copy the optimization that unrolls the loop from `pow` to `checked_pow` and `overflowing_pow`.
f20dfa8 to
6986ad2
Compare
This comment has been minimized.
This comment has been minimized.
if base == 2 ** k, then (2 ** k) ** n == 2 ** (k * n) == 1 << (k * n)
6986ad2 to
e845f15
Compare
|
ping @jhpratt? |
|
The last email I had received was the CI failure. I'll take a look when I can, but I'm busy in Europe at the moment. |
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…=<try> Optimize `checked_ilog` and `pow` when `base` is a power of two
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (c0638d9): comparison URL. Overall result: ❌✅ regressions and improvements - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 2.8%, secondary 1.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 2.3%, secondary 2.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.0%, secondary 0.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 515.367s -> 512.478s (-0.56%) |
|
Hmm...those regressions are far beyond what would be considered significant (both statistically and by looking at the graphs). Any thoughts? |
View all comments
Optimize
checked_ilogandpowwhen the base is a power of two